Intake-ESM Integration based on #1218 (#2690)
…er.yml, skeleton of intake-esm inclusion following #1218
Codecov Report
❌ Your patch check has failed because the patch coverage (94.56%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #2690      +/-   ##
==========================================
- Coverage   95.62%   95.61%   -0.01%
==========================================
  Files         266      267       +1
  Lines       15601    15693      +92
==========================================
+ Hits        14918    15005      +87
- Misses        683      688       +5
bouweandela left a comment:
Great to see progress on this @charles-turner-1!
esmvalcore/config-developer.yml (outdated)
    SYNDA: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
    NCI: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
  input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc'
  catalogs:
The plan was not to extend config-developer further, but rather to move this to the new configuration that lives in ~/.config/esmvaltool. See #2371 for an example of what we thought the configuration should look like.
esmvalcore/config-developer.yml (outdated)
- /g/data/oi10/catalog/v2/esm/catalog.json
facets:
  # mapping from recipe facets to intake-esm catalog facets
  # TODO: Fix these when Gadi is back up
You could also test on DKRZ Levante; the intake catalogs there are located at /pool/data/Catalogs/dkrz_cmip6_disk.json
esmvalcore/intake/_dataset.py (outdated)
    return ([_CACHE[cat_url] for cat_url in catalog_urls], facet_list)


class IntakeDataset(Dataset):
I'm having some reservations about subclassing the Dataset class for this purpose:
- A typical use case for many of our users will be that they have most data available from a central catalog that is managed by a central administrator, but want to augment that with the ability to download some files themselves. In that case, it is really useful to have the ability to deduplicate (e.g. pick the latest version of a file). I'm not sure if this can be achieved by subclassing the Dataset object.
- We will likely want to add support for other catalogs as well, e.g. intake-esgf, xcube, and STAC. If we need a new Dataset class for each of these, it may become confusing to users.
- How will this work from the recipe?
As an alternative, would it be an option to load the available data sources from the configuration / Dataset.session and then make the Dataset.files method loop over the available sources and deduplicate input files?
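A minimal sketch of that alternative, assuming each data source can be called with the requested facets and returns the files it knows about. FileRecord, find_files, deduplicate and the two stand-in sources are hypothetical names, not ESMValCore API, and comparing versions as plain strings assumes the CMIP6 vYYYYMMDD convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record for one file reported by one data source."""

    name: str     # file name without the version, e.g. 'tas_a.nc'
    version: str  # dataset version, assumed to follow 'vYYYYMMDD'
    source: str   # label of the data source that reported the file


def deduplicate(records):
    """Keep the latest version of each file name.

    Versions are compared as plain strings, which orders 'vYYYYMMDD'
    correctly; on a tie the record seen first is kept.
    """
    latest = {}
    for record in records:
        current = latest.get(record.name)
        if current is None or record.version > current.version:
            latest[record.name] = record
    return sorted(latest.values(), key=lambda r: r.name)


def find_files(sources, facets):
    """Query every source (in priority order) and deduplicate the result."""
    records = []
    for source in sources:
        records.extend(source(facets))
    return deduplicate(records)


# Hypothetical stand-ins for a central catalog and a user's own downloads.
def central_catalog(facets):
    return [FileRecord("tas_a.nc", "v20190101", "central")]


def local_downloads(facets):
    return [
        FileRecord("tas_a.nc", "v20200101", "local"),
        FileRecord("tas_b.nc", "v20190101", "local"),
    ]


files = find_files([central_catalog, local_downloads], {"short_name": "tas"})
# tas_a.nc comes from the newer local download; tas_b.nc only exists locally.
```

Passing the sources in priority order means that, when the central catalog and a local download report the same version of a file, the central copy is the one kept.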
ESMValCore is quite flexible with what facets it accepts. We have a translation between some of 'our' facets and the official ones in the
If these are completely determined by the other facets, you can add them automatically using the extra facets facility.
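The facet translation and the "completely determined" facets could be sketched as two small functions. The column names in FACET_MAP follow the CMIP6 DRS names commonly used by intake-esm catalogs, and DERIVED_FACETS is a hypothetical stand-in for ESMValCore's extra facets; neither mapping is taken from this PR, so check them against the actual catalog.

```python
# Hypothetical mapping from ESMValCore recipe facets to the CMIP6 DRS
# column names commonly found in intake-esm catalogs.
FACET_MAP = {
    "activity": "activity_id",
    "institute": "institution_id",
    "dataset": "source_id",
    "exp": "experiment_id",
    "ensemble": "member_id",
    "mip": "table_id",
    "short_name": "variable_id",
    "grid": "grid_label",
}

# Hypothetical table of facets that are fully determined by another
# facet, in the spirit of ESMValCore's extra facets.
DERIVED_FACETS = {
    "dataset": {
        "ACCESS-ESM1-5": {"institute": "CSIRO"},
    },
}


def add_derived_facets(facets):
    """Return a copy of facets with derivable facets filled in."""
    result = dict(facets)
    for key, table in DERIVED_FACETS.items():
        for name, value in table.get(facets.get(key), {}).items():
            # Never overwrite a facet the user set explicitly.
            result.setdefault(name, value)
    return result


def to_catalog_query(facets):
    """Rename recipe facets to catalog columns, dropping unmapped ones."""
    return {FACET_MAP[key]: value for key, value in facets.items() if key in FACET_MAP}


recipe_facets = {
    "dataset": "ACCESS-ESM1-5",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
    "mip": "Amon",
    "short_name": "tas",
    "grid": "gn",
}
query = to_catalog_query(add_derived_facets(recipe_facets))
```

With intake-esm, a dict like query could then be passed on as esm_datastore.search(**query).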
How about adding a new module called e.g.
Thanks for the review, Bouwe, super helpful! I've only had a skim so far, but I'll get those suggestions incorporated next week.
I've started working on some interface code in #2765 that could be useful here too.
Cheers, I'll take a look when I get the chance! I'm going to talk to Martin Durant (author of Intake) in ~10 days, so hopefully this PR will pick up some steam after that, as I'll be working on this stuff more actively.
This should be a lot easier now that #2765 has landed. You could take the esmvalcore.io.intake_esgf module as an example and add a configuration file similar to data-intake-esgf.yml.
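A rough, self-contained sketch of what an intake-esm data source modelled on that idea might do when searching. The catalog is represented by a plain list of row dicts standing in for the esm_datastore.df table; the class name IntakeESMSource, its find_data method, the path column and the example rows are all assumptions, not the interface introduced in #2765.

```python
import fnmatch
from dataclasses import dataclass, field


def _matches(value, pattern):
    """Match one catalog value against a facet value, a list, or a glob."""
    if isinstance(pattern, (list, tuple)):
        return any(_matches(value, p) for p in pattern)
    return fnmatch.fnmatchcase(str(value), str(pattern))


@dataclass
class IntakeESMSource:
    """Hypothetical data source wrapping the rows of one intake-esm catalog.

    rows stands in for esm_datastore.df (one dict per catalog row) and
    facet_map translates recipe facets to catalog column names.
    """

    name: str
    rows: list
    facet_map: dict = field(default_factory=dict)

    def find_data(self, **facets):
        """Return the paths of all rows matching every requested facet."""
        query = {self.facet_map.get(k, k): v for k, v in facets.items()}
        return [
            row["path"]
            for row in self.rows
            if all(col in row and _matches(row[col], pat) for col, pat in query.items())
        ]


source = IntakeESMSource(
    name="example",
    rows=[
        {"source_id": "ACCESS-ESM1-5", "variable_id": "tas", "path": "/data/a.nc"},
        {"source_id": "ACCESS-CM2", "variable_id": "tas", "path": "/data/b.nc"},
        {"source_id": "ACCESS-ESM1-5", "variable_id": "pr", "path": "/data/c.nc"},
    ],
    facet_map={"dataset": "source_id", "short_name": "variable_id"},
)
tas_files = source.find_data(dataset="ACCESS-*", short_name="tas")
```

Keeping the facet translation inside the source keeps catalog-specific column names out of the recipe, and returning plain paths would let results from several sources be merged and deduplicated afterwards.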
@charles-turner-1 I popped the latest
I've been hoping to get back to this for a while... I just keep managing to find more urgent stuff that gets in the way. @rbeucher and I will be in Canberra together next week, so hopefully we can get a handle on our priorities then.
Just a heads up that I'm getting back to this now; I'll reach out if I have any issues!
A few lines of coverage still to fix, but I think this is mostly ready for review now!
Description
config-developer.yml to include intake datasets.
TODO:
- tests/unit/test_dataset.py, or is it preferable to add a new test module? I'll hold off writing these until I work out the facets issue.
- intake submodule, but I could move it into dataset if that's preferable? Also affects previous point.
I've requested a review, but obviously this is nowhere near ready to go on the infrastructure side with respect to tests, etc. A couple of pointers in the right direction and that stuff should fly along.
Closes #31
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: